Computing Exact Clustering Posteriors with Subset Convolution
An exponential-time exact algorithm is provided for the task of clustering n
items of data into k clusters. Instead of seeking one partition, posterior
probabilities are computed for summary statistics: the number of clusters, and
pairwise co-occurrence. The method is based on subset convolution, and yields
the posterior distribution for the number of clusters in O(n * 3^n) operations,
or O(n^3 * 2^n) using fast subset convolution. Pairwise co-occurrence
probabilities are then obtained in O(n^3 * 2^n) operations. This is
considerably faster than exhaustive enumeration of all partitions. Comment: 6 figures
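The O(n * 3^n) recurrence mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and the per-cluster score table `f` (indexed by bitmask, standing in for whatever cluster likelihood and prior a model supplies) are assumptions made for illustration. The sketch uses plain subset enumeration, not the fast subset convolution variant.

```python
def cluster_count_weights(n, f):
    """Return w where w[k] is the total weight of all partitions of n items
    into exactly k clusters, given f[T] = score of cluster T (T is a bitmask).

    Hypothetical helper: f is an assumed input, not defined in the abstract.
    """
    full = (1 << n) - 1
    # g[k][S] = total weight of partitions of subset S into k clusters
    g = [[0] * (1 << n) for _ in range(n + 1)]
    g[0][0] = 1  # the empty set has one partition into zero clusters
    for k in range(1, n + 1):
        for S in range(1, full + 1):
            low = S & -S       # lowest-indexed item of S
            rest = S ^ low     # remaining items of S
            total = 0
            # The cluster T containing the lowest item is chosen first,
            # so each partition is counted exactly once.
            sub = rest
            while True:
                T = sub | low
                total += f[T] * g[k - 1][S ^ T]
                if sub == 0:
                    break
                sub = (sub - 1) & rest  # next subset of rest
            g[k][S] = total
    return [g[k][full] for k in range(n + 1)]


def posterior_num_clusters(n, f):
    """Normalize the weights into a distribution over the number of clusters."""
    w = cluster_count_weights(n, f)
    z = sum(w)
    return [x / z for x in w]


# With every cluster scored 1, the weights are the Stirling numbers
# of the second kind, which gives a quick sanity check.
weights = cluster_count_weights(4, [1] * 16)
posterior = posterior_num_clusters(4, [1] * 16)
```

Each value of k visits every (S, T) pair with T a subset of S, which is where the 3^n factor comes from; the abstract's O(n^3 * 2^n) bound relies on fast subset convolution, which this sketch does not attempt.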
Bayesian graphical model determination using decision theory
Abstract: Bayesian model determination in the complete class of graphical models is considered using a decision theoretic framework within the regular exponential family. The complete class contains both decomposable and non-decomposable graphical models. A utility measure based on a logarithmic score function is introduced under reference priors for the model parameters. The logarithmic utility of a model is decomposed into predictive performance and relative complexity. Axioms of decision theory lead to the judgement of the plausibility of a model in terms of the posterior expected utility. This quantity has an analytic expression for decomposable models when certain reference priors are used and the exponential family is closed under marginalization. For non-decomposable models, a simulation-consistent estimate of the expectation can be obtained. Both real and simulated data sets are used to illustrate the introduced methodology.
On the Outage Capacity of Orthogonal Space-time Block Codes Over Multi-cluster Scattering MIMO Channels
The multiple-cluster scattering MIMO channel is a useful model for pico-cellular
MIMO networks. In this paper, orthogonal space-time block coded transmission
over such a channel is considered, where the effective channel equals the
product of n complex Gaussian matrices. A simple and accurate closed-form
approximation to the channel outage capacity has been derived in this setting.
The result is valid for an arbitrary number of clusters n-1 of scatterers and
an arbitrary antenna configuration. Numerical results are provided to study the
relative outage performance between the multi-cluster and the Rayleigh-fading
MIMO channels for which n=1. Comment: Added references; changes made in Section 3-
Labeled Directed Acyclic Graphs: a generalization of context-specific independence in directed graphical models
We introduce a novel class of labeled directed acyclic graph (LDAG) models
for finite sets of discrete variables. LDAGs generalize earlier proposals for
allowing local structures in the conditional probability distribution of a
node, such that unrestricted label sets determine which edges can be deleted
from the underlying directed acyclic graph (DAG) for a given context. Several
properties of these models are derived, including a generalization of the
concept of Markov equivalence classes. Efficient Bayesian learning of LDAGs is
enabled by introducing an LDAG-based factorization of the Dirichlet prior for
the model parameters, such that the marginal likelihood can be calculated
analytically. In addition, we develop a novel prior distribution for the model
structures that can appropriately penalize a model for its labeling complexity.
A non-reversible Markov chain Monte Carlo algorithm combined with a greedy hill
climbing approach is used for illustrating the useful properties of LDAG models
for both real and synthetic data sets. Comment: 26 pages, 17 figures
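To make the phrase "the marginal likelihood can be calculated analytically" concrete, here is a minimal sketch of the standard Dirichlet-multinomial marginal likelihood for a single conditional distribution (one node in one parent context). It is not the LDAG-based factorization itself, which the abstract does not spell out; the function name and the example hyperparameters are assumptions for illustration.

```python
from math import lgamma, exp


def log_marginal_likelihood(counts, alpha):
    """Log marginal likelihood of an observed sequence of categorical outcomes
    with the given category counts, under a Dirichlet(alpha) prior.

    Standard closed form:
      Gamma(A) / Gamma(A + N) * prod_j Gamma(alpha_j + n_j) / Gamma(alpha_j)
    where A = sum(alpha) and N = sum(counts).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    a_total = sum(alpha)
    n_total = sum(counts)
    result = lgamma(a_total) - lgamma(a_total + n_total)
    for n_j, a_j in zip(counts, alpha):
        result += lgamma(a_j + n_j) - lgamma(a_j)
    return result


# Two outcomes of the first category and one of the second,
# under a uniform Dirichlet(1, 1) prior (assumed for illustration).
value = exp(log_marginal_likelihood([2, 1], [1.0, 1.0]))
```

In a Bayesian network score of this general kind, terms like this are summed in log space over nodes and parent contexts; how LDAG labels merge or split those contexts is specific to the paper and not reproduced here.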
Kpax3 : Bayesian bi-clustering of large sequence datasets
Motivation: Estimation of the hidden population structure is an important step in many genetic studies. Often the aim is also to identify which sequence locations are the most discriminative between groups of samples for a given data partition. Automated discovery of interesting patterns that are present in the data can help to generate new biological hypotheses. Results: We introduce Kpax3, a Bayesian method for bi-clustering multiple sequence alignments. Influence of individual sites will be determined in a supervised manner by using informative prior distributions for the model parameters. Our inference method uses an implementation of both split-merge and Gibbs sampler type MCMC algorithms to traverse the joint posterior of partitions of samples and variables. We use a large Rotavirus sequence dataset to demonstrate the ability of Kpax3 to generate biologically important hypotheses about differential selective pressures across a virus protein. Peer reviewed